
    The Size Conundrum: Why Online Knowledge Markets Can Fail at Scale

    In this paper, we interpret the community question answering websites on the StackExchange platform as knowledge markets, and analyze how and why these markets can fail at scale. A knowledge market framing allows site operators to reason about market failures and to design policies to prevent them. Our goal is to provide insights into large-scale knowledge market failures through an interpretable model. We explore a set of interpretable economic production models on a large empirical dataset to analyze the dynamics of content generation in knowledge markets. Amongst these, the Cobb-Douglas model best explains the empirical data and provides an intuitive explanation for content generation through the concepts of elasticity and diminishing returns. Content generation depends on user participation and also on how specific types of content (e.g., answers) depend on other types (e.g., questions). We show that these factors of content generation have constant elasticity---a percentage increase in any of the inputs leads to a constant percentage increase in the output. Furthermore, markets exhibit diminishing returns---the marginal output decreases as an input is incrementally increased. Knowledge markets also vary in their returns to scale---the increase in output resulting from a proportionate increase in all inputs. Importantly, many knowledge markets exhibit diseconomies of scale---measures of market health (e.g., the percentage of questions with an accepted answer) decrease as a function of the number of participants. The implications of our work are two-fold: site operators ought to design incentives as a function of system size (number of participants), and the market lens can shed light on complex dependencies amongst different content types and participant actions in general social networks.
    Comment: The 27th International Conference on World Wide Web (WWW), 2018
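    The Cobb-Douglas properties described in this abstract (constant elasticity, diminishing returns, and returns to scale) can be checked numerically. Below is a minimal sketch assuming a hypothetical production function answers = A * questions^alpha * users^beta; the parameter values are made up for illustration and are not the paper's fitted values:

```python
# Hedged sketch: Cobb-Douglas production with hypothetical parameters.
def cobb_douglas(questions, users, A=1.0, alpha=0.6, beta=0.3):
    """Output (e.g., answers) as a function of two inputs."""
    return A * questions**alpha * users**beta

q, u = 1000.0, 500.0
base = cobb_douglas(q, u)

# Constant elasticity: a 1% increase in questions yields ~alpha% more output.
bumped = cobb_douglas(q * 1.01, u)
elasticity = (bumped / base - 1) / 0.01  # approaches alpha for small changes

# Diminishing returns: the marginal output of one extra question shrinks
# as the number of questions grows (because alpha < 1).
m1 = cobb_douglas(q + 1, u) - base
m2 = cobb_douglas(2 * q + 1, u) - cobb_douglas(2 * q, u)

# Returns to scale: doubling all inputs scales output by 2**(alpha + beta),
# which is less than 2 when alpha + beta < 1 (decreasing returns to scale).
scale = cobb_douglas(2 * q, 2 * u) / base
```

    With these illustrative parameters, elasticity is approximately 0.6, the second marginal increment is smaller than the first, and doubling both inputs less than doubles the output.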

    Towards high quality, scalable education: Techniques in automated assessment and probabilistic user behavior modeling

    There are two primary challenges for instructors in offering a high-quality course at large scale. The first is scaling educational experiences to such a large audience. The second is enabling adaptivity of the educational experience. This thesis addresses both challenges by developing new techniques for large-scale automated assessment (addressing scalability) and new models for interpretable user behavior analysis in educational environments (improving the quality of interaction via personalized education). Specifically, I perform a study of automated assessment of complex assignments, exploring the effectiveness of different types of features in a feasibility study. I argue for re-framing automated assessment in these more complex contexts as a ranking problem, and provide a systematic approach for integrating expert, peer, and automated assessment via an active-learning-to-rank formulation that outperforms a traditional randomized training solution. I also present the design and implementation of CLaDS---a Cloud-based Lab for Data Science---to enable students to engage with real-world data science problems at scale and at minimal cost ($7.40/student). I discuss our experience deploying seven major text data assignments to students in both on-campus and online courses, and show that the general infrastructure of CLaDS can be used to efficiently deliver a wide range of hands-on data science assignments. Understanding student behavior is necessary for improving the quality of scalable education through adaptivity. To this end, I present two general user behavior models for analyzing student interaction log data. The first focuses on the discovery and analysis of action-based roles in community question answering (CQA) platforms using a generative model called the MDMM behavior model.
First, I show interesting distinctions within CQA communities in question-asking behavior (where two distinct types of askers can be identified) and answering behavior (where two distinct roles surrounding answers emerge). Second, I find that where there are statistically significant differences in health metrics across topical groups on StackExchange, there are also statistically significant differences in behavior compositions, suggesting a relationship between behavior composition and health. Third, I show that the MDMM behavior model can be used to demonstrate similar but distinct evolutionary patterns between topical groups. The second model focuses on discovering temporal action patterns of learners in Coursera MOOCs. I present a two-layer hidden Markov model (2L-HMM) to extract a multi-resolution summary of user behavior patterns and their evolution, and show that these patterns can be used to extract latent features that correlate with educational outcomes. Finally, I develop the Piazza Educational Role Mining (PERM) system to close the gap between theory and practice by providing an easy-to-use web-based interface for applying probabilistic user behavior models to Piazza CQA interaction data. PERM allows instructors to easily crawl their courses and run MDMM behavior analyses on them. These analyses give instructors insight into the common user behavior patterns (roles) uncovered, by plotting their action distributions in a browser. PERM enables instructors to perform deep-dives into an individual role by viewing the concrete sessions the model has assigned to that role, along with each session's individual actions and associated content. This allows instructors to flexibly combine data-driven statistical inference (through the MDMM behavior model) with a qualitative understanding of the behavior within a role.
Finally, PERM models each individual user as a mixture over the discovered roles, which instructors can also explore in a deep-dive to see exactly what individual users were doing on the platform.
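The idea of representing users as mixtures over discovered roles can be illustrated with a small sketch. Everything below is hypothetical: the role names, the sessions, and the `user_role_mixtures` helper are illustrative, not PERM's actual implementation (which derives mixtures from the MDMM model's probabilistic assignments rather than hard counts):

```python
from collections import Counter

def user_role_mixtures(session_roles):
    """Given (user, role) pairs, one per session, return each user's
    empirical distribution over roles."""
    counts_by_user = {}
    for user, role in session_roles:
        counts_by_user.setdefault(user, Counter())[role] += 1
    return {
        user: {role: n / sum(counts.values()) for role, n in counts.items()}
        for user, counts in counts_by_user.items()
    }

# Hypothetical session-to-role assignments.
sessions = [("alice", "asker"), ("alice", "answerer"),
            ("alice", "answerer"), ("bob", "asker")]
mix = user_role_mixtures(sessions)
# alice is 1/3 asker, 2/3 answerer; bob is entirely asker.
```

Such a per-user mixture is what lets an instructor drill down from a role summary to the concrete behavior of one student.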

    A Generative Model for Discovering Action-Based Roles and Community Role Compositions on Community Question Answering Platforms

    This paper proposes a generative model for discovering user roles and community role compositions in Community Question Answering (CQA) platforms. While past research shows that participants play different roles in online communities, automatically discovering these roles and providing a summary of user behavior that is readily interpretable remains an important challenge. Furthermore, there has been relatively little insight into the distribution of these roles between communities. Does a community’s composition over user roles vary as a function of topic? How does it relate to the health of the underlying community? Does role composition evolve over time? The generative model proposed in this paper, the mixture of Dirichlet-multinomial mixtures (MDMM) behavior model, can (1) automatically discover interpretable user roles (as probability distributions over atomic actions) directly from log data, and (2) uncover community-level role compositions to facilitate such cross-community studies. A comprehensive experiment on all 161 non-meta communities on the StackExchange CQA platform demonstrates that our model can be useful for a wide variety of behavioral studies, and we highlight three empirical insights. First, we show interesting distinctions in question-asking behavior on StackExchange (where two distinct types of askers can be identified) and answering behavior (where two distinct roles surrounding answers emerge). Second, we find statistically significant differences in behavior compositions across topical groups of communities on StackExchange, and that those groups that have statistically significant differences in health metrics also have statistically significant differences in behavior compositions, suggesting a relationship between behavior composition and health. Finally, we show that the MDMM behavior model can be used to demonstrate similar but distinct evolutionary patterns between topical groups.
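    As a rough illustration of the kind of generative process the abstract describes---a community-level role composition drawn from a Dirichlet prior, roles as probability distributions over atomic actions, and sessions generated by picking a role and then sampling actions---here is a hedged sketch. The action vocabulary, priors, and exact structure are assumptions made for illustration; the paper's actual MDMM model may differ:

```python
import random

# Hypothetical atomic actions; the paper's action vocabulary may differ.
ACTIONS = ["question", "answer", "comment", "edit"]

def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_community(num_sessions, num_roles, rng):
    # Community-level composition over roles (Dirichlet prior).
    composition = sample_dirichlet([1.0] * num_roles, rng)
    # Each role is a distribution over atomic actions (Dirichlet prior).
    roles = [sample_dirichlet([0.5] * len(ACTIONS), rng)
             for _ in range(num_roles)]
    sessions = []
    for _ in range(num_sessions):
        # Pick a role for the session, then sample its actions.
        role = rng.choices(range(num_roles), weights=composition)[0]
        actions = rng.choices(ACTIONS, weights=roles[role],
                              k=rng.randint(1, 10))
        sessions.append((role, actions))
    return composition, roles, sessions

rng = random.Random(0)
composition, roles, sessions = generate_community(100, 3, rng)
```

    Inference in the paper runs this process in reverse: given only the observed action logs, it recovers the roles and each community's composition over them.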